Towards Finite-Sample Convergence of Direct Reinforcement Learning
نویسندگان
چکیده
While direct, model-free reinforcement learning often performs better than model-based approaches in practice, only the latter have yet supported theoretical guarantees for finite-sample convergence. A major difficulty in analyzing the direct approach in an online setting is the absence of a definitive exploration strategy. We extend the notion of admissibility to direct reinforcement learning and show that standard Q-learning with optimistic initial values and constant learning rate is admissible. The notion justifies the use of a greedy strategy that we believe performs very well in practice and holds theoretical significance in deriving finite-sample convergence for direct reinforcement learning. We present empirical evidence that supports our idea.
منابع مشابه
A Convergent Reinforcement Learning Algorithm in the Continuous Case: The Finite-Element Reinforcement Learning
This paper presents a direct reinforcement learning algorithm, called Finite-Element Reinforcement Learning, in the continuous case, i.e. continuous state-space and time. The evaluation of the value function enables the generation of an optimal policy for reinforcement control problems, such as target or obstacle problems, viability problems or optimization problems. We propose a continuous for...
متن کاملHierarchical Model-Based Reinforcement Learning: R-MAX + MAXQ
Hierarchical decomposition promises to help scale reinforcement learning algorithms naturally to real-world problems by exploiting their underlying structure. Model-based algorithms, which provided the first finite-time convergence guarantees for reinforcement learning, may also play an important role in coping with the relative scarcity of data in large environments. In this paper, we introduc...
متن کاملKernel-Based Models for Reinforcement Learning
Model-based approaches to reinforcement learning exhibit low sample complexity while learning nearly optimal policies, but they are generally restricted to finite domains. Meanwhile, function approximation addresses continuous state spaces but typically weakens convergence guarantees. In this work, we develop a new algorithm that combines the strengths of Kernel-Based Reinforcement Learning, wh...
متن کاملConvergence of reinforcement learning to Nash equilibrium: A search-market experiment
Since the introduction of Reinforcement Learning (RL) in Game Theory, a growing literature is concerned with the theoretical convergence of RL-driven outcomes towards Nash equilibrium. In this paper, we apply this issue to a search-theoretic framework (posted-price market) where sellers are confronted with a population of imperfectly informed buyers and take one decision per period (posted pric...
متن کاملFinite sample analysis of the GTD Policy Evaluation Algorithms in Markov Setting
In reinforcement learning (RL) , one of the key components is policy evaluation, which aims to estimate the value function (i.e., expected long-term accumulated reward) of a policy. With a good policy evaluation method, the RL algorithms will estimate the value function more accurately and find a better policy. When the state space is large or continuous Gradient-based Temporal Difference(GTD) ...
متن کامل